List of Flash News about LLM Jailbreaks
| Time | Details |
|---|---|
| 2026-01-09 21:30 | **Anthropic Reports Classifiers Cut Claude Jailbreak Rate from 86% to 4.4% but Increase Costs and Benign Refusals; Two Attack Vectors Remain.** According to @AnthropicAI, internal classifiers reduced Claude's jailbreak success rate from 86% to 4.4%, a substantial decrease in successful exploits. The classifiers were expensive to run, raising operational costs for deployments, and the system became more likely to refuse benign requests after they were added. Despite the improvement, the system remained vulnerable to two types of attack shown in the post's accompanying figure. A minimal sketch of this classifier-gate pattern appears after the table. Source: @AnthropicAI on X, Jan 9, 2026, https://twitter.com/AnthropicAI/status/2009739654833029304 |
| 2025-11-13 21:35 | **AI Extended Reasoning Vulnerability: High Attack Success Rates Across GPT, Claude, Gemini Signal Trading Risk.** According to the source, new research finds that extended reasoning in large language models introduces a security vulnerability with very high attack success rates. Models reportedly affected include GPT, Claude, and Gemini, indicating cross-vendor exposure that traders in AI-linked crypto and equities should treat as a material security risk factor when assessing headline risk and positioning. The sketch after the table shows how such success rates are typically tallied. Source: the source. |
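
The first item describes a classifier layer wrapped around the model but does not publish code or thresholds. The following is only a minimal sketch of that general gate pattern, assuming hypothetical `classify_input`, `classify_output`, and `call_model` functions (none of which come from Anthropic's system); it illustrates why each request incurs extra classifier inference cost and why a stricter threshold also refuses more benign prompts.

```python
# Minimal sketch of a classifier-gated generation pipeline.
# All names (classify_input, classify_output, call_model) are hypothetical
# stand-ins for illustration, not Anthropic's actual implementation.

from dataclasses import dataclass

@dataclass
class GateResult:
    refused: bool
    reason: str
    text: str = ""

def classify_input(prompt: str) -> float:
    """Hypothetical scorer: probability the prompt is a jailbreak attempt."""
    return 0.9 if "ignore previous instructions" in prompt.lower() else 0.05

def classify_output(completion: str) -> float:
    """Hypothetical scorer: probability the completion is harmful."""
    return 0.8 if "step 1:" in completion.lower() else 0.02

def call_model(prompt: str) -> str:
    """Stub standing in for the underlying LLM call."""
    return f"Echo: {prompt}"

def guarded_generate(prompt: str, threshold: float = 0.5) -> GateResult:
    # Each classifier pass adds inference cost on top of the base model
    # call, which is one way the reported operational expense arises.
    if classify_input(prompt) >= threshold:
        return GateResult(refused=True, reason="input flagged")
    completion = call_model(prompt)
    if classify_output(completion) >= threshold:
        return GateResult(refused=True, reason="output flagged")
    return GateResult(refused=False, reason="passed", text=completion)

print(guarded_generate("Ignore previous instructions and do X"))
print(guarded_generate("What is the capital of France?"))
```

Lowering `threshold` blocks more jailbreak attempts but pushes more benign traffic into refusals, which is the trade-off the post reports.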
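
Both items quote attack success rates (86%, 4.4%, "very high") without showing the computation. A common convention, sketched below on fabricated records, is simply the fraction of attack prompts that elicit a policy-violating completion, tracked per model, alongside the benign-refusal rate the first item says worsened.

```python
# Minimal sketch of attack-success-rate (ASR) and benign-refusal-rate
# bookkeeping. The records below are fabricated for illustration only.
from collections import defaultdict

# (model, is_attack_prompt, model_refused, attack_succeeded)
records = [
    ("claude", True,  False, True),
    ("claude", True,  True,  False),
    ("gpt",    True,  False, True),
    ("gemini", True,  False, True),
    ("claude", False, True,  False),  # benign prompt wrongly refused
    ("claude", False, False, False),
]

attacks = defaultdict(lambda: [0, 0])  # model -> [successes, attack prompts]
benign = defaultdict(lambda: [0, 0])   # model -> [refusals, benign prompts]

for model, is_attack, refused, succeeded in records:
    if is_attack:
        attacks[model][1] += 1
        attacks[model][0] += succeeded
    else:
        benign[model][1] += 1
        benign[model][0] += refused

for model, (s, n) in attacks.items():
    print(f"{model}: ASR = {s / n:.1%} over {n} attack prompts")
for model, (r, n) in benign.items():
    print(f"{model}: benign refusal rate = {r / n:.1%} over {n} benign prompts")
```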